Using collocation segmentation to extract translation units in a phrase-based statistical machine translation system
Abstract
This report evaluates the impact of using a novel collocation segmentation method for phrase extraction in the standard phrase-based statistical machine translation approach. The collocation segmentation technique is applied simultaneously on the source and target sides, and the resulting segmentation is used to extract translation units. Experiments are reported on the Spanish-to-English EuroParl task, and promising results are achieved in translation quality.
Similar resources
Using collocation segmentation to extract translation units in a phrase-based statistical machine translation system / Implementación de una segmentación estadística complementaria para extraer unidades de traducción en un sistema de traducción estadístico basado en frases
This report evaluates the impact of using a novel collocation segmentation method for phrase extraction in the standard phrase-based statistical machine translation approach. The collocation segmentation technique is implemented simultaneously in the source and target side. The resulting collocation segmentation is used to extract translation units. Experiments are reported in the Spanish-toEng...
Integration of statistical collocation segmentations in a phrase-based statistical machine translation system
This study evaluates the impact of integrating two different collocation segmentation methods in a standard phrase-based statistical machine translation approach. The collocation segmentation techniques are applied simultaneously on the source and target sides. Each resulting collocation segmentation is used to extract translation units. Experiments are reported in the English-to-Spanish Bi...
Phrase Segmentation Model using Collocation and Translational Entropy
In this paper, we propose a phrase segmentation model for phrase-based statistical machine translation. We observed that good translation candidates generated by a conventional phrase-based SMT decoder have lexical cohesion and show more uniform translation for each phrase segment. Based on this observation, we propose a novel phrase segmentation model using collocation between two adjacent ...
Using Collocation Segmentation to Augment the Phrase Table
This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created by extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation ove...
UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system
This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These novel segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information or Gravity-Counts. The analysis of translation results allows the measures to be divided into three ...
Journal title:
- Procesamiento del Lenguaje Natural
Volume 45
Publication date: 2010